Abstract

Two neural learning controller designs for manipulators are considered. The first
design is based on a neural inverse-dynamics system. The second is the combination of the
first one with a neural adaptive state feedback system. Both types of controllers enable the
manipulator to perform any given task very well after a period of training, and to do other
untrained tasks satisfactorily. The second design also enables the manipulator to
compensate for unpredictable perturbations.

1. Introduction 

The design of advanced control systems for robot manipulators has been a very active 
area of research in recent years. Inadequacy of current control strategies suggests that 
there is a need for a newer and faster control architecture which will account for both 
learning and control of robotic manipulators. 

In classical systems theory, input-output descriptions are based on some assumed or 
predetermined mathematical structures, normally a set of linear differential equations. 
Replacement of these predetermined structures by learned associative memory mappings of 
stimulus-response leads to more general, normally non-linear, representations of the 
connections between inputs and outputs. This procedure can be implemented by neural
networks [1]. The best example of a system with such an architecture is the human brain,
which performs many complex functions superbly. 

In the problem of motor control, obtaining an input function $u(t)$ to generate a desired
motion $y(t)$ is directly related to finding the inverse-dynamics of the controlled system. Let
the operator $\mathcal{G}$ denote the dynamics relation of the system, where $\mathcal{G}(u)=y$. Then the
inverse-dynamics of the system is the operator $\mathcal{L}=\mathcal{G}^{-1}$ such that $\mathcal{L}(y)=u$. Knowing the
inverse-dynamics relation $\mathcal{L}=\mathcal{G}^{-1}$, for a given desired motion trajectory $y_d$, the required
input $u_d$ can be found from $u_d=\mathcal{L}(y_d)$. This is because the motion corresponding to $u_d$ is
equal to $y=\mathcal{G}(u_d)=\mathcal{G}(\mathcal{L}(y_d))=y_d$.
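As a minimal sketch of this round trip, assume a hypothetical scalar system y' = -a y + u (it is not a system treated in this paper). Its inverse dynamics is u = y' + a y, so driving the system with the input computed from a desired trajectory should reproduce that trajectory. The names `forward`, `inverse`, and the constant `A` are illustrative only.

```python
import math

A = 2.0     # assumed plant constant (illustrative)
DT = 1e-3   # integration step

def y_d(t):
    """Desired trajectory (an arbitrary smooth choice)."""
    return math.sin(t)

def y_d_dot(t):
    """Time derivative of the desired trajectory."""
    return math.cos(t)

def inverse(t):
    """Inverse dynamics: the input that produces y_d for y' = -A*y + u."""
    return y_d_dot(t) + A * y_d(t)

def forward(u_of_t, y0, t_end):
    """Forward dynamics: integrate y' = -A*y + u(t) with midpoint (RK2) steps."""
    y, t = y0, 0.0
    for _ in range(int(round(t_end / DT))):
        k1 = -A * y + u_of_t(t)
        k2 = -A * (y + 0.5 * DT * k1) + u_of_t(t + 0.5 * DT)
        y += DT * k2
        t += DT
    return y

T_END = 3.0
y_final = forward(inverse, y_d(0.0), T_END)
tracking_error = abs(y_final - y_d(T_END))
```

The remaining `tracking_error` is only the discretization error of the integrator, which is the numerical counterpart of the statement that applying the inverse-dynamics input recovers the desired motion.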

It has been shown that multi-layer neural networks with sigmoidal functions are able to 
map any measurable function to another with an arbitrary degree of accuracy, provided that 
there are enough units in their hidden layers. Therefore, such networks can be used for 
approximating the model of the inverse of the dynamics of a system [2-10]. In this paper the 
development of neuromorphic learning controllers is considered. First a recurrent neural
network learning controller C is designed. The design has a neural inverse-dynamics block
$\mathcal{L}$ and a PD-type feedback block $\mathcal{K}$. Next the learning controller C is modified, where its
PD-type feedback block is replaced by a neural adaptive state feedback block $\mathcal{H}$, which is to
optimally compensate for unpredictable perturbations. The architectures of these learning
controllers are similar to those in [10], which are inspired by the model of the cerebellum
given by Kawato [5-6].
2. Robot Dynamics


The dynamics of a robot manipulator can be represented by an operator $\mathcal{G}$ which
corresponds to a set of n coupled nonlinear differential equations, given by

$M(q)\,q'' + N(q,q') + Q(q) = u$ (1a)

or $\mathcal{G}(u) = q$ (1b)

where $q$, $q'$, and $q''$ are n-dimensional vectors of the positions, velocities, and accelerations of
the joints, respectively, where "prime" denotes the time-derivative. $M(q)$ is the n x n inertia
matrix of the arm, which is symmetric and positive definite. $N(q,q')$ is the n-dimensional
vector of Coriolis, centrifugal, and frictional forces. $Q(q)$ is the n-dimensional vector
representing the torques due to gravitational forces, and $u$ is the n-dimensional vector of the
generalized input torques applied to the robot.
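For concreteness, the terms of equation (1a) can be written out for the simplest case n = 1. The sketch below assumes a hypothetical one-link arm (a point mass on a massless rod with viscous friction); the parameter values and function names are illustrative and are not taken from this paper.

```python
import math

# Assumed parameters of a hypothetical one-link arm.
MASS = 1.5      # kg
LENGTH = 0.4    # m
FRICTION = 0.1  # N*m*s/rad, viscous friction coefficient
GRAVITY = 9.81  # m/s^2

def inertia(q):
    """M(q): scalar inertia, constant for a single link."""
    return MASS * LENGTH ** 2

def velocity_terms(q, q_dot):
    """N(q, q'): viscous friction only; Coriolis and centrifugal terms vanish for one link."""
    return FRICTION * q_dot

def gravity_torque(q):
    """Q(q): gravity torque, with q measured from the downward vertical."""
    return MASS * GRAVITY * LENGTH * math.sin(q)

def required_torque(q, q_dot, q_ddot):
    """Left-hand side of M(q) q'' + N(q, q') + Q(q) = u."""
    return inertia(q) * q_ddot + velocity_terms(q, q_dot) + gravity_torque(q)

# Holding the arm horizontal and at rest needs exactly the gravity torque.
u_hold = required_torque(math.pi / 2, 0.0, 0.0)
```

For a multi-joint arm, `inertia` would return an n x n matrix and the other two functions n-vectors, with the same structure.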

3. Learning Controller Design 

There are a variety of algorithms which can be used for multi-layer neural networks to 
learn the mapping between two patterns [1]. However, the state-of-the-art learning
algorithms are most effective when the input-output patterns are fixed. This condition, in 
general, is not satisfied when the objective error function is not identical to the error function 
at the neural network's output layer. To satisfy this condition we observe the following. 

Lemma 1 

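Consider a stable system given by the operator $\mathcal{G}$ as in Figure 1, where its output $q$ is
desired to follow a reference function $q_r$. Let the high gain feedback block given by the linear
operator $\mathcal{K}$ be such that the closed-loop system $(I+\mathcal{G}\mathcal{K})^{-1}\mathcal{G}$ is stable and that
$\|\mathcal{G}\mathcal{K}\| \gg 1$. Then for bounded input $v$ the output error $e=q_r-q$ is bounded and is given by
$e=[(I+\mathcal{G}\mathcal{K})^{-1}\mathcal{G}](\delta v)\approx\mathcal{K}^{-1}(\delta v)$,
where $\delta v=r-v$ and $r$ is the input for which $\mathcal{G}(r)=q_r$. Moreover, the feedback signal
$\delta u=\mathcal{K}(e)\approx\delta v$.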


Proof 



Figure 1 


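From Figure 1, by some block manipulation, it is easy to see that $e=[(I+\mathcal{G}\mathcal{K})^{-1}\mathcal{G}](\delta v)$,
where $\delta v=r-v$. Now let $v$ be bounded. Then, since $r$ exists, $\delta v$ is also bounded. But since the
closed-loop system $(I+\mathcal{G}\mathcal{K})^{-1}\mathcal{G}$ is stable, the error signal $e=q_r-q$ is also bounded. Now since
$\|\mathcal{G}\mathcal{K}\| \gg 1$, we get $e\approx\mathcal{K}^{-1}(\delta v)$. On the other hand, since $\mathcal{K}$ is linear, we have
$\delta u=[(I+\mathcal{G}\mathcal{K})^{-1}\mathcal{G}\mathcal{K}](\delta v)$. But again, since $\|\mathcal{G}\mathcal{K}\| \gg 1$, it is easy to see that
$\delta u=\mathcal{K}(e)\approx\delta v$. □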
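The high-gain approximation in Lemma 1 can be checked numerically in the simplest possible setting. The sketch below assumes a static scalar plant and a static scalar feedback gain, which is a strong simplification of the operators in the lemma; `closed_loop`, `G_GAIN`, and the numeric values are illustrative assumptions.

```python
def closed_loop(g, k, r, v):
    """Static scalar loop: q = g*u, u = v + du, du = k*e, e = q_r - q, q_r = g*r."""
    dv = r - v
    e = g / (1.0 + g * k) * dv   # e = (1 + g k)^-1 g (dv)
    du = k * e                   # feedback signal
    return e, du, dv

G_GAIN = 2.0          # assumed plant gain
R_IN, V_IN = 1.0, 0.4

results = {}
for k in (1.0, 10.0, 1000.0):
    results[k] = closed_loop(G_GAIN, k, R_IN, V_IN)
```

As the loop gain `g * k` grows, `du` approaches `dv` and `e` approaches `dv / k`, while for `k = 1.0` the feedback signal is still far from `dv`.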
4. Neural Inverse-Dynamics Model for Learning Control

The learning controller C, shown in Figure 2, has only one neural network block $\mathcal{L}$ to
approximate the inverse-dynamics model. There is also a feedback block $\mathcal{K}$, of the PD-type,
which is used for both the neural learning and the error compensation, and is given by

$\mathcal{K}(e) = \delta u = K_p\,e + K_d\,e'$. (2)



Figure 2 


Network's Architecture 

The neural block $\mathcal{L}$ used here is essentially a recurrent multi-layer neural network. The
input-output relation of the neural network $\mathcal{L}$ is given by

$x' = A_1\,g(x) + B_1\,\theta$ (3)

$v = C_1\,g(x)$

where $\theta = [q_r^T\;\; {q_r'}^T\;\; {q_r''}^T\;\; 1]^T \in \mathbb{R}^{3n+1}$, $x \in \mathbb{R}^N$, and $v \in \mathbb{R}^n$ are respectively the vectors of the
network's input, states, and outputs. $A_1$, $B_1$, and $C_1$ are respectively the matrices of the
network's state recurrence, input and output connection weights, and $g$ is the sigmoidal
function given by $g(x)=\tanh(x)$. The unity input in vector $\theta$ is added to allow for the automatic
adjustment of the bias term.
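The forward pass of such a recurrent block can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: forward-Euler integration, the step size, the network size, and the sinusoidal reference are all choices made here, and the helper names are hypothetical.

```python
import math
import random

def tanh_vec(x):
    return [math.tanh(xi) for xi in x]

def mat_vec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def random_matrix(rows, cols, rng, scale=0.2):
    """Weights drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def network_step(x, theta, a, b, dt):
    """One forward-Euler step of x' = A g(x) + B theta."""
    gx = tanh_vec(x)
    x_dot = [p + q for p, q in zip(mat_vec(a, gx), mat_vec(b, theta))]
    return [xi + dt * di for xi, di in zip(x, x_dot)]

def network_output(x, c):
    """Readout v = C g(x)."""
    return mat_vec(c, tanh_vec(x))

rng = random.Random(0)
n_joints, n_states = 2, 6
A1 = random_matrix(n_states, n_states, rng)
B1 = random_matrix(n_states, 3 * n_joints + 1, rng)
C1 = random_matrix(n_joints, n_states, rng)

x = [0.0] * n_states
for step in range(200):
    t = 0.01 * step
    # theta = [q_r, q_r', q_r'', 1] for an arbitrary sinusoidal reference on both joints
    theta = [math.sin(t)] * 2 + [math.cos(t)] * 2 + [-math.sin(t)] * 2 + [1.0]
    x = network_step(x, theta, A1, B1, dt=0.01)
v = network_output(x, C1)
```

Because tanh is bounded by one and each output weight is at most 0.2 in magnitude, every component of `v` is bounded by 0.2 times the number of states before any training takes place.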

Network's Learning Rule 

The learning algorithm used for the network is a modification of the delta rule [1], and is 
given by [11] 

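$a'_{1,ij} = \alpha_1\,\delta u^T C_1 \nabla g(x)\,\eta_{1,ij}$ (4)

$b'_{1,ik} = \beta_1\,\delta u^T C_1 \nabla g(x)\,\xi_{1,ik}$

$c'_{1,pj} = \gamma_1\,\delta u_p\,g(x_j)$

$\eta'_{1,ij} = A_1 \nabla g(x)\,\eta_{1,ij} + I_i\,g(x_j)$

$\xi'_{1,ik} = A_1 \nabla g(x)\,\xi_{1,ik} + I_i\,\theta_k$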
where $I_i$ is the ith column of the identity matrix, $\delta u$ is the feedback torque which is also the
network's output error, $\nabla g(x)=\partial g(x)/\partial x$ is the Jacobian matrix, and $\alpha_1$, $\beta_1$, and $\gamma_1$ are the
learning rate constants which are small positive numbers. The initial values of matrices
$A_1$, $B_1$, and $C_1$ are selected randomly between -0.2 and 0.2, and $\eta_{1,ij}(0)=\xi_{1,ik}(0)=0$.
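To make the index structure of this learning rule concrete, the sketch below evaluates its right-hand sides at one time instant. It is an illustration under stated assumptions: the sensitivities are stored as nested lists with `eta[i][j]` and `xi[i][k]` each an N-vector, the error projection is taken as the feedback torque transposed times C times the Jacobian times the sensitivity (the dimensionally consistent ordering), and the function name `learning_rates` and all numeric values are hypothetical.

```python
import math

def learning_rates(x, theta, du, a, c, eta, xi, alpha, beta, gamma):
    """Right-hand sides of the weight and sensitivity equations at one instant."""
    n_states = len(x)
    gx = [math.tanh(v) for v in x]
    dg = [1.0 - g * g for g in gx]   # diagonal of the Jacobian of tanh

    def a_times_jac(vec):            # A * diag(dg) * vec
        return [sum(a[r][m] * dg[m] * vec[m] for m in range(n_states))
                for r in range(n_states)]

    def error_projection(vec):       # du^T * C * diag(dg) * vec
        return sum(du[p] * sum(c[p][m] * dg[m] * vec[m] for m in range(n_states))
                   for p in range(len(du)))

    def unit(i, value):              # I_i * value
        return [value if r == i else 0.0 for r in range(n_states)]

    a_dot = [[alpha * error_projection(eta[i][j]) for j in range(n_states)]
             for i in range(n_states)]
    b_dot = [[beta * error_projection(xi[i][k]) for k in range(len(theta))]
             for i in range(n_states)]
    c_dot = [[gamma * du[p] * gx[j] for j in range(n_states)]
             for p in range(len(du))]
    eta_dot = [[[s + u for s, u in zip(a_times_jac(eta[i][j]), unit(i, gx[j]))]
                for j in range(n_states)] for i in range(n_states)]
    xi_dot = [[[s + u for s, u in zip(a_times_jac(xi[i][k]), unit(i, theta[k]))]
               for k in range(len(theta))] for i in range(n_states)]
    return a_dot, b_dot, c_dot, eta_dot, xi_dot

# Tiny example: N = 2 states, one joint, input vector of length 4.
N, K = 2, 4
x = [0.3, -0.1]
theta = [0.5, 0.0, -0.5, 1.0]
A = [[0.1, -0.2], [0.05, 0.15]]
C = [[0.2, -0.1]]
eta0 = [[[0.0] * N for _ in range(N)] for _ in range(N)]   # zero initial sensitivities
xi0 = [[[0.0] * N for _ in range(K)] for _ in range(N)]
a_dot, b_dot, c_dot, eta_dot, xi_dot = learning_rates(
    x, theta, [0.7], A, C, eta0, xi0, 0.01, 0.01, 0.01)
```

With zero initial sensitivities, only the output weights move at the first instant, while the sensitivities themselves start to build up from the terms `I_i g(x_j)` and `I_i theta_k`.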

The objective of the learning controller C is to force the system's output error to zero
through repeated trials of the desired task. During trials, when the reference input is
repeatedly applied to the system, the system's output error is used to adjust the controller
parameters, which are the connection weights of the neural network block $\mathcal{L}$. Therefore, the
feedforward block $\mathcal{L}$ is modified in such a way as to force the feedback torque to vanish, which
indirectly decreases the robot's output error. When the error becomes small, learning has
been accomplished and the neural network block $\mathcal{L}$ is said to have acquired the model of the
inverse-dynamics of the robot. For this to hold, the corresponding learning algorithm must be
convergent; that is, the dynamics of the learning system must be asymptotically stable.
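The mechanism of driving the feedback torque to zero by adapting the feedforward path can be illustrated with a deliberately reduced stand-in. The sketch below replaces the manipulator by an assumed static scalar plant and the neural network by a single feedforward weight `w`; the gains, learning rate, and the name `train` are illustrative assumptions, not the paper's controller.

```python
import math

PLANT_GAIN = 2.0      # assumed static plant q = PLANT_GAIN * u
FEEDBACK_GAIN = 50.0  # high-gain feedback, PLANT_GAIN * FEEDBACK_GAIN >> 1
RATE = 1.0            # learning rate
DT = 0.01

def train(t_end):
    """Adapt a scalar feedforward weight w using the feedback torque as the error signal."""
    w = 0.0
    history = []
    for step in range(int(round(t_end / DT))):
        q_r = math.sin(step * DT)                    # reference
        v = w * q_r                                  # feedforward torque
        # Solve the algebraic loop q = g*(v + k*(q_r - q)) for q.
        q = PLANT_GAIN * (v + FEEDBACK_GAIN * q_r) / (1.0 + PLANT_GAIN * FEEDBACK_GAIN)
        du = FEEDBACK_GAIN * (q_r - q)               # feedback torque
        w += DT * RATE * du * q_r                    # delta-rule style update
        history.append(abs(du))
    return w, history

w_final, du_history = train(60.0)
```

In this toy setting the weight converges to the inverse of the plant gain and the feedback torque decays toward zero, which mirrors the role the feedback signal plays as the training error for the inverse-dynamics block.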

Result 1 

Consider the robotic manipulator given by the operator $\mathcal{G}$ as in equation (1). Let the
neural learning controller C given by equations (2) and (3) be applied to the system, as shown
in Figure 2. Let the feedback block $\mathcal{K}$ be such that the closed-loop system $(I+\mathcal{G}\mathcal{K})^{-1}\mathcal{G}$ is stable
and that $\|\mathcal{G}\mathcal{K}\| \gg 1$. Then the neural learning controller C, together with the learning rule (4),
is asymptotically stable. That is, the proposed learning controller forces the manipulator's
trajectory $q$, $q'$ to follow the desired trajectory $q_r$, $q_r'$ after a sufficiently long period of time.

Proof 

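Let $\eta_{1,ij}=\partial x/\partial a_{1,ij}$ and $\xi_{1,ik}=\partial x/\partial b_{1,ik}$; then from equation (3), we get [11]

$x = \int_0^t [A_1 g(x) + B_1\theta]\,d\tau$

$\eta_{1,ij} = \int_0^t \frac{\partial}{\partial a_{1,ij}}[A_1 g(x) + B_1\theta]\,d\tau = \int_0^t [A_1 \nabla g(x)\,\eta_{1,ij} + I_i\,g(x_j)]\,d\tau$

$\xi_{1,ik} = \int_0^t \frac{\partial}{\partial b_{1,ik}}[A_1 g(x) + B_1\theta]\,d\tau = \int_0^t [A_1 \nabla g(x)\,\xi_{1,ik} + I_i\,\theta_k]\,d\tau$.

Differentiating the above two relationships, we get

$\eta'_{1,ij} = A_1 \nabla g(x)\,\eta_{1,ij} + I_i\,g(x_j)$ (5)

$\xi'_{1,ik} = A_1 \nabla g(x)\,\xi_{1,ik} + I_i\,\theta_k$.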

Now, without loss of generality, we assume that there exists an input function $r(t)$ to the
manipulator such that $q_r=\mathcal{G}(r)$. Let a performance function for the learning process of the
neural inverse-dynamics network be defined by

$J_1(t) = 0.5\,[r(t)-v(t)]^T [r(t)-v(t)] = 0.5\,\|\delta r\|^2$. (6)

Since $J_1(t)$ is positive definite and monotonically increasing, for asymptotic stability $J_1'(t)$
must be negative definite. But the time derivative of $J_1(t)$ is given by


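$J_1'(t) = \delta r^T\,\partial(\delta r)/\partial t$ (7)

$= -\delta r^T [(\partial v/\partial a_{1,ij})\,a'_{1,ij} + (\partial v/\partial b_{1,ik})\,b'_{1,ik} + (\partial v/\partial c_{1,pj})\,c'_{1,pj}]$

$= -\delta r^T [C_1 \nabla g(x)\,\eta_{1,ij}\,a'_{1,ij} + C_1 \nabla g(x)\,\xi_{1,ik}\,b'_{1,ik} + I_p\,g(x_j)\,c'_{1,pj}]$,

where $\delta r=r-v$ and summation over the weight indices is implied. On the other hand, since
$\|\mathcal{G}\mathcal{K}\| \gg 1$, from Lemma 1 we have $\delta u\approx\delta r$. Therefore we have

$J_1'(t) = -\delta u^T [C_1 \nabla g(x)\,\eta_{1,ij}\,a'_{1,ij} + C_1 \nabla g(x)\,\xi_{1,ik}\,b'_{1,ik} + I_p\,g(x_j)\,c'_{1,pj}]$. (8)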


However, for $a'_{1,ij}$, $b'_{1,ik}$, and $c'_{1,pj}$ given by equation (4), we get

$J_1'(t) = -\alpha_1 [\delta u^T C_1 \nabla g(x)\,\eta_{1,ij}]^2 - \beta_1 [\delta u^T C_1 \nabla g(x)\,\xi_{1,ik}]^2 - \gamma_1 [\delta u_p\,g(x_j)]^2$ (9)

which is a negative definite scalar function, except when we have $\delta u=0$ where the learning is
complete. Therefore, from the second method of Liapunov, the learning controller C with the
weight adjustments given by equation (4) is asymptotically stable (i.e., it is convergent).
That is, the connection weight matrices $A_1$, $B_1$, and $C_1$ in the neural inverse-dynamics block
$\mathcal{L}$ will be adjusted until $J_1(t)=0$, that is when $\delta u=u-v=0$ or equivalently when $\delta r=r-v=0$.
However, since the feedback operator is linear, $\delta u=K_p e+K_d e'=0$ implies that $e=e'=0$, since $e$
and $e'$ are linearly independent. Therefore, $q=q_r$ and $q'=q_r'$ as time t approaches infinity
(i.e., the manipulator's trajectory $q$, $q'$ follows the desired trajectory $q_r$, $q_r'$). □
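The sign argument of this proof can be observed numerically in a reduced case. The sketch below is an illustration under stated assumptions: only the output weights are adapted, the hidden state is frozen, the target torque is constant, and the torque error `dr` is used directly in place of the feedback torque (the high-gain approximation). All names and values are hypothetical.

```python
import math

def simulate(gamma=0.5, dt=0.01, steps=2000):
    """Train only the readout weights of v = C g(x) with c'_pj = gamma * dr_p * g(x_j)."""
    x = [0.4, -0.9, 1.3]                          # frozen hidden state (assumption)
    gx = [math.tanh(xi) for xi in x]
    r = [0.8, -0.3]                               # target torque, held constant
    c = [[0.1, -0.2, 0.05], [0.0, 0.15, -0.1]]
    costs = []
    for _ in range(steps):
        v = [sum(cp[j] * gx[j] for j in range(3)) for cp in c]
        dr = [r[p] - v[p] for p in range(2)]
        costs.append(0.5 * sum(d * d for d in dr))   # performance function J
        for p in range(2):
            for j in range(3):
                c[p][j] += dt * gamma * dr[p] * gx[j]
    return costs

costs = simulate()
```

The recorded cost never increases from one step to the next and decays toward zero, which is the discrete-time picture of a performance function whose derivative is a negative sum of squares.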

The neural network $\mathcal{L}$ part of the controller C is able to acquire the model of the
inverse-dynamics of the manipulator after a sufficiently long period of training. After this,
the robot with the inverse-dynamics block $\mathcal{L}$ alone (i.e., without the error feedback block $\mathcal{K}$)
is able to perform the trained tasks very well. In addition, the robot is able to perform some
new tasks satisfactorily. However, without the feedback block $\mathcal{K}$, the robot is not quite able to
compensate for unpredictable perturbations. It is easily seen, however, that leaving block $\mathcal{K}$
in the controller loop after the period of training greatly improves the ability of the controller
to compensate for perturbations. This is the motivation for the next design.

5. Neural Adaptive State Feedback Model for Learning Control 

The learning controller C in this section contains both a feedforward and a feedback
neural network block. The feedback neural block $\mathcal{H}$ in this design has substituted for the
PD-type error feedback block, as in Figure 3.

The neural adaptive state feedback block $\mathcal{H}$ is intended as an optimal state feedback
controller, and contains two sub-networks. One is the dynamics identifier $\mathcal{D}$, which realizes
the dynamics model of the system's perturbation about the nominal operating point. The
other is the state feedback $\mathcal{F}$, which generates an optimal state feedback for disturbance
compensation. The overall feedback network $\mathcal{H}$ learns to generate the optimal state feedback
torques to eliminate perturbations. From the Linear Quadratic Control Theory, this network
is equivalent to an optimal state feedback which continuously identifies the parameters of the
perturbation dynamics of the manipulator, and from these, produces the optimum
compensating torques.


Figure 3 (Learning Controller C)


Feedback Network's Architecture 

The input to neural block $\mathcal{H}$ is a 2n-vector of the angular position and velocity errors $e$ and $e'$.
The outputs of the network are the n compensating torque signals $\delta u$.

The input-output relationship for the dynamics identifier network $\mathcal{D}$ is given by

$y' = A_2\,g(y) + B_2\,\rho$ (10)

$\boldsymbol{\sigma} = C_2\,g(y)$

where $\rho = [\delta u^T\;\; 1]^T \in \mathbb{R}^{n+1}$, $y \in \mathbb{R}^N$, and $\boldsymbol{\sigma} = [\sigma^T\;\; {\sigma'}^T]^T \in \mathbb{R}^{2n}$ are respectively the input, the state, and
the output of the network. $A_2$, $B_2$, and $C_2$ are respectively the matrices of the network's state
recurrence, input and output connection weights.

The input-output relationship for the state feedback network $\mathcal{F}$ is given by

$z' = A_3\,g(z) + B_3\,\bar{e}$ (11)

$\delta v = C_3\,g(z)$

where $\bar{e}=[e^T\;\; {e'}^T\;\; 1]^T \in \mathbb{R}^{2n+1}$, $z$, and $\delta v \in \mathbb{R}^n$ are respectively the input, the state, and the output
of the network, and $A_3$, $B_3$, $C_3$ are respectively the matrices of the network's state
recurrence, input and output connection weights. As shown in Figure 3, there are some
internal feedback blocks $\mathcal{K}$ and $L$ within the neural adaptive state feedback block $\mathcal{H}$ which are
used primarily to provide a performance function for the networks' learning algorithm.




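That is,

$\delta u = \delta v + \mu$, (12)

$\mu = \mathcal{K}(\boldsymbol{\varphi}) = K_p\,\varphi + K_d\,\varphi'$

$\boldsymbol{\varphi} = \boldsymbol{\sigma} + \boldsymbol{\lambda}$

$\lambda = L_p\,\varepsilon + L_d\,\varepsilon'$

where $\boldsymbol{\varphi}=[\varphi^T, {\varphi'}^T]^T$, $\boldsymbol{\sigma}=[\sigma^T, {\sigma'}^T]^T$, $\boldsymbol{\lambda}=[\lambda^T, {\lambda'}^T]^T$, and
$\boldsymbol{\varepsilon}=[\varepsilon^T, {\varepsilon'}^T]^T=[(e-\sigma)^T, (e'-\sigma')^T]^T$. $\mathcal{K}$ is a
linear high gain feedback operator, and $L$ is a linear feedback gain block.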

In the feedback block $\mathcal{H}$, the neural dynamics identifier $\mathcal{D}$ approximates the input-output
relationship of the dynamics of perturbations by forcing its outputs to follow the system
errors $e$ and $e'$. The neural state feedback $\mathcal{F}$, on the other hand, approximates the
input-output relationship of an optimal state error feedback system by forcing its output to
follow the input of the neural dynamics identifier $\mathcal{D}$. This, in effect, adjusts block $\mathcal{F}$ to
approximate the inverse of the neural dynamics identifier $\mathcal{D}$.
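The interplay of the two sub-networks can be illustrated with a reduced scalar stand-in. The sketch below assumes that the perturbation dynamics is a static gain, that each sub-network is a single adjustable gain, and that the torque perturbation is an externally supplied sinusoid rather than a closed-loop signal. None of this is the paper's architecture; the names `c2`, `c3`, `D_TRUE`, and `train` are hypothetical.

```python
import math

D_TRUE = 0.5   # assumed static perturbation gain: e = D_TRUE * du
RATE = 2.0     # learning rate for both gains
DT = 0.01

def train(t_end):
    c2 = 0.0   # identifier gain: its output is c2 * du and should follow e
    c3 = 0.0   # feedback gain: its output is c3 * e and should follow du
    for step in range(int(round(t_end / DT))):
        du = math.sin(step * DT)    # exciting torque perturbation (exogenous here)
        e = D_TRUE * du             # resulting trajectory error
        eps = e - c2 * du           # identifier output error
        mu = du - c3 * e            # feedback-network output error
        c2 += DT * RATE * eps * du
        c3 += DT * RATE * mu * e
    return c2, c3

c2_final, c3_final = train(200.0)
```

In this toy case the identifier gain converges to the perturbation gain and the feedback gain converges to its reciprocal, so their product approaches one, which is the scalar analogue of the feedback sub-network approximating the inverse of the identifier.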

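Networks' Learning Rules

The learning algorithm used for the neural dynamics identifier network $\mathcal{D}$ is similar to
that of inverse-dynamics network $\mathcal{L}$, i.e., the time derivatives of the connection weight
matrices $A_2$, $B_2$, and $C_2$ are given by [11]

$a'_{2,ij} = \alpha_2\,\boldsymbol{\varepsilon}^T C_2 \nabla g(y)\,\eta_{2,ij}$ (13)

$b'_{2,ik} = \beta_2\,\boldsymbol{\varepsilon}^T C_2 \nabla g(y)\,\xi_{2,ik}$

$c'_{2,pj} = \gamma_2\,\boldsymbol{\varepsilon}_p\,g(y_j)$

$\eta'_{2,ij} = A_2 \nabla g(y)\,\eta_{2,ij} + I_i\,g(y_j)$

$\xi'_{2,ik} = A_2 \nabla g(y)\,\xi_{2,ik} + I_i\,\rho_k$

where $\boldsymbol{\varepsilon}=[\varepsilon^T, {\varepsilon'}^T]^T$ is the network's output error, $\nabla g(y)=\partial g(y)/\partial y$ is the Jacobian matrix, and
$\alpha_2$, $\beta_2$, and $\gamma_2$ are small positive learning rate constants. The initial values of matrices $A_2$,
$B_2$, and $C_2$ are selected randomly between -0.2 and 0.2, and $\eta_{2,ij}(0)=\xi_{2,ik}(0)=0$.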

The learning scheme for the neural state feedback network block $\mathcal{F}$ is similar; i.e., the
time derivatives of the connection weight matrices $A_3$, $B_3$, $C_3$ are given by

$a'_{3,ij} = \alpha_3\,\mu^T C_3 \nabla g(z)\,\eta_{3,ij}$ (14)

$b'_{3,ik} = \beta_3\,\mu^T C_3 \nabla g(z)\,\xi_{3,ik}$

$c'_{3,pj} = \gamma_3\,\mu_p\,g(z_j)$

$\eta'_{3,ij} = A_3 \nabla g(z)\,\eta_{3,ij} + I_i\,g(z_j)$

$\xi'_{3,ik} = A_3 \nabla g(z)\,\xi_{3,ik} + I_i\,\bar{e}_k$

where $\mu=\delta u-\delta v$ is the network's output error, $\nabla g(z)=\partial g(z)/\partial z$ is the Jacobian matrix, and $\alpha_3$,
$\beta_3$, and $\gamma_3$ are small positive learning rate constants. The initial values of matrices $A_3$, $B_3$,
and $C_3$ are selected randomly between -0.2 and 0.2, and $\eta_{3,ij}(0)=\xi_{3,ik}(0)=0$.

From Result 1, the inverse-dynamics neural network $\mathcal{L}$ with its learning rule is able to
realize the model of the inverse-dynamics $\mathcal{G}^{-1}$ of the robot and to generate the required robot
torque corresponding to the desired trajectory $q_r$ and $q_r'$. From the Linear Quadratic Control
Theory, in order to generate the compensating torque corresponding to the dynamics
perturbations about the nominal trajectory of the robot, the adaptive state feedback neural
network $\mathcal{H}$ must identify the dynamics relation of the perturbations and correspondingly
generate the optimal feedback according to some performance criterion. But for this the
corresponding learning algorithm must be convergent (i.e., the dynamics of the learning
system must be asymptotically stable).

Result 2 

Consider the robotic manipulator given by the operator $\mathcal{G}$ as in equation (1). Assume that
the neural learning controller C, given by equations (3) and (10-12), is applied to the system,
as shown in Figure 3. Let the feedback operator $L$ be a unity gain. Also let the high gain
feedback block $\mathcal{K}$ be such that the closed-loop system $(I+\mathcal{G}\mathcal{K})^{-1}\mathcal{G}$ is stable and that $\|\mathcal{G}\mathcal{K}\| \gg 1$.
Then the neural learning controller C, together with the learning rules (4) and (13-14), is
asymptotically stable. That is, the learning controller C forces the manipulator's trajectory $q$
and $q'$ to follow the desired trajectory $q_r$ and $q_r'$ after a sufficiently long time.

Proof 

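From Result 1, since $L$ is a unity gain, $\mathcal{K}$ is such that $(I+\mathcal{G}\mathcal{K})^{-1}\mathcal{G}$ is stable, and $\|\mathcal{G}\mathcal{K}\| \gg 1$,
the learning process for the neural inverse-dynamics network $\mathcal{L}$ is asymptotically stable.

Now let $\eta_{2,ij}=\partial y/\partial a_{2,ij}$, $\xi_{2,ik}=\partial y/\partial b_{2,ik}$, $\eta_{3,ij}=\partial z/\partial a_{3,ij}$, and $\xi_{3,ik}=\partial z/\partial b_{3,ik}$. Then, similar to
the proof of Result 1, from the neural network's dynamics equations (10-11), we get

$\eta'_{2,ij} = A_2 \nabla g(y)\,\eta_{2,ij} + I_i\,g(y_j)$ (15)

$\xi'_{2,ik} = A_2 \nabla g(y)\,\xi_{2,ik} + I_i\,\rho_k$

$\eta'_{3,ij} = A_3 \nabla g(z)\,\eta_{3,ij} + I_i\,g(z_j)$

$\xi'_{3,ik} = A_3 \nabla g(z)\,\xi_{3,ik} + I_i\,\bar{e}_k$.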


Now considering the convergence of the feedback block $\mathcal{H}$, let a performance function for
the learning process of the dynamics identifier sub-network $\mathcal{D}$ be defined by

$J_2(t) = 0.5\,\boldsymbol{\varepsilon}(t)^T \boldsymbol{\varepsilon}(t)$ (16)

where $\boldsymbol{\varepsilon}=[\varepsilon^T, {\varepsilon'}^T]^T$ and $\varepsilon=e-\sigma$. Since $J_2(t)$ is positive definite and monotonically increasing,
for asymptotic stability, $J_2'(t)$ must be negative definite. Using the chain rule, we get

$J_2'(t) = \boldsymbol{\varepsilon}^T\,\partial\boldsymbol{\varepsilon}/\partial t$ (17)

$= -\boldsymbol{\varepsilon}^T [C_2 \nabla g(y)\,\eta_{2,ij}\,a'_{2,ij} + C_2 \nabla g(y)\,\xi_{2,ik}\,b'_{2,ik} + I_p\,g(y_j)\,c'_{2,pj}]$.

However, for the weight adjustments given by equation (13), we have

$J_2'(t) = -\alpha_2 [\boldsymbol{\varepsilon}^T C_2 \nabla g(y)\,\eta_{2,ij}]^2 - \beta_2 [\boldsymbol{\varepsilon}^T C_2 \nabla g(y)\,\xi_{2,ik}]^2 - \gamma_2 [\boldsymbol{\varepsilon}_p\,g(y_j)]^2$ (18)

which is a negative definite scalar function, except when $\boldsymbol{\varepsilon}=0$ where learning is complete.
Similarly, let the learning performance function for neural block $\mathcal{F}$ be defined by

$J_3(t) = 0.5\,\zeta(t)^T \zeta(t)$ (19)

where $\zeta=\delta r-\delta v$ is the output error of the network, $\delta r=r-v$, and $r$ is such that $\mathcal{G}(r)=q_r$.
Again, since $J_3(t)$ is positive definite and monotonically increasing, for asymptotic stability,
$J_3'(t)$ must be negative definite. Similar to the earlier case, by the chain rule, we have

$J_3'(t) = \zeta^T\,\partial\zeta/\partial t$ (20)

$= -\zeta^T [C_3 \nabla g(z)\,\eta_{3,ij}\,a'_{3,ij} + C_3 \nabla g(z)\,\xi_{3,ik}\,b'_{3,ik} + I_p\,g(z_j)\,c'_{3,pj}]$.

On the other hand, since $\|\mathcal{G}\mathcal{K}\| \gg 1$, from Lemma 1 we have $\delta u\approx\delta r$ and hence $\zeta\approx\delta u-\delta v=\mu$.
Therefore, for the weight adjustments given by equation (14), we get

$J_3'(t) = -\alpha_3 [\mu^T C_3 \nabla g(z)\,\eta_{3,ij}]^2 - \beta_3 [\mu^T C_3 \nabla g(z)\,\xi_{3,ik}]^2 - \gamma_3 [\mu_p\,g(z_j)]^2$ (21)

which is a negative definite scalar function, except when $\mu=0$ where learning is complete.
Therefore from the second method of Liapunov, this learning system is asymptotically stable.
This means that the connection weights in the networks will be adjusted until $J_2(t)=0$ and
$J_3(t)=0$, or equivalently $\varepsilon=e-\sigma=0$, $\varepsilon'=e'-\sigma'=0$, and $\mu=\delta u-\delta v=0$. However, these imply that
$e=e'=0$, and that the dynamics identifier sub-network $\mathcal{D}$ acquires the model of the dynamics of
the perturbation system. Also, the optimal state feedback sub-network $\mathcal{F}$ becomes identical
to the inverse $\mathcal{D}^{-1}$ of the dynamics identifier sub-network $\mathcal{D}$, which generates the
compensating torque corresponding to the trajectory perturbation $e$ and $e'$. Therefore, $q=q_r$
and $q'=q_r'$ as time t approaches infinity (i.e., the manipulator's trajectory $q$, $q'$ follows the
desired trajectory $q_r$, $q_r'$). □

6. Conclusion 

In this paper, two neural learning controller designs have been considered. They mimic 
the functions of the cerebellum for the learning and control of voluntary movements and they 
have parallel processing capabilities which make them fast and adaptable. The designs 
have several promising attributes that make them very feasible solutions to current problems 
in robotics. Most importantly, such controllers are able to approximate the model of the
inverse-dynamics of the robot during the training period. This allows the robot to learn
repetitive motions almost perfectly. Beyond that, the robot can also perform tasks on which it
has not yet been trained, and perform them well. In addition, the second design has a good
adaptation capability which allows the controller to compensate for unexpected disturbances.

Another advantage of these designs is that they do not require knowledge of the system 
parameters, and they are robust with respect to parameter variation and disturbances under 
a variety of tasks. Finally, the parallel processing property of these architectures makes 
them highly suitable for the integration of a multitude of sensory information into the motion 
controller networks. 